
    Distance-Based Independence Screening for Canonical Analysis

    This paper introduces a new method, Distance-based Independence Screening for Canonical Analysis (DISCA), for reducing the dimensions of two random vectors of arbitrary dimensions. The objective is to identify low-dimensional linear projections of the two random vectors such that any further linear dimension reduction would necessarily destroy part of the dependence structure: the removed components are not independent of what remains. The essence of DISCA is to use distance correlation to eliminate "redundant" dimensions iteratively until no further dimension can be removed. Unlike existing canonical analysis methods, DISCA does not require the reduced subspaces of the two random vectors to have equal dimensions, nor does it impose distributional assumptions on the random vectors. We show that under mild conditions our approach uncovers the lowest-dimensional linear dependency structure between the two random vectors, and that our conditions are weaker than those of some sufficient-linear-subspace-based methods. Numerically, DISCA requires solving a non-convex optimization problem. We formulate it as a difference-of-convex (DC) program and apply the alternating direction method of multipliers (ADMM) to the convex step of the DC algorithm to parallelize and accelerate the computation. Some sufficient-linear-subspace-based methods rely on a potentially computation-intensive bootstrap to determine the dimensions of the reduced subspaces in advance; our method avoids this step. In simulations, we present cases that DISCA solves effectively while other methods cannot. In both the simulation studies and the real data cases, when other state-of-the-art dimension reduction methods are applicable, DISCA performs comparably to or better than most of them. Code and an R package are available on GitHub: https://github.com/ChuanpingYu/DISCA.
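
    The screening idea can be conveyed with a small Python sketch. The snippet below is only an illustration under simplifying assumptions: it implements the standard sample distance correlation and a naive greedy coordinate-elimination loop, not the paper's DC/ADMM optimization over linear projections; the function names, the tolerance `tol`, and the synthetic data are all hypothetical.

        # Illustrative sketch only: greedy coordinate screening driven by distance
        # correlation. This is NOT the DISCA DC/ADMM algorithm; it only conveys the
        # idea of removing "redundant" dimensions while dependence is preserved.
        import numpy as np

        def distance_correlation(x, y):
            """Sample distance correlation between data matrices x (n x p) and y (n x q)."""
            x = np.atleast_2d(x.T).T
            y = np.atleast_2d(y.T).T
            a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
            b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
            A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centering
            B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
            dcov2 = (A * B).mean()
            denom = np.sqrt((A * A).mean() * (B * B).mean())
            return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

        def greedy_screen(x, y, tol=1e-2):
            """Drop coordinates of x while the distance correlation with y barely changes."""
            keep = list(range(x.shape[1]))
            changed = True
            while changed and len(keep) > 1:
                changed = False
                base = distance_correlation(x[:, keep], y)
                for j in list(keep):
                    reduced = [k for k in keep if k != j]
                    if distance_correlation(x[:, reduced], y) >= base - tol:
                        keep, changed = reduced, True   # coordinate j looks redundant
                        break
            return keep

        rng = np.random.default_rng(0)
        x = rng.normal(size=(200, 4))
        y = x[:, :1] + 0.1 * rng.normal(size=(200, 1))   # y depends only on coordinate 0
        print(greedy_screen(x, y))                        # expected to retain coordinate 0

    DISCA itself searches over arbitrary linear projections rather than coordinate subsets, which is why it is posed as a DC optimization problem rather than a simple greedy loop.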

    Adaptive multiscale detection of filamentary structures in a background of uniform random points

    We are given a set of $n$ points that might be uniformly distributed in the unit square $[0,1]^2$. We wish to test whether the set, although mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve with $C^{\alpha}$-norm bounded by $\beta$. An asymptotic detection threshold exists in this problem: for a constant $T_-(\alpha,\beta)>0$, if the number of points sampled from the curve is smaller than $T_-(\alpha,\beta)\,n^{1/(1+\alpha)}$, reliable detection is not possible for large $n$. We describe a multiscale significant-runs algorithm that can reliably detect concentration of data near a smooth curve, without knowing the smoothness parameters $\alpha$ or $\beta$ in advance, provided that the number of points on the curve exceeds $T_*(\alpha,\beta)\,n^{1/(1+\alpha)}$. This algorithm therefore has an optimal detection threshold, up to a factor $T_*/T_-$. At the heart of our approach is an analysis of the data by counting membership in multiscale multianisotropic strips. The strips have area $2/n$ and exhibit a variety of lengths, orientations and anisotropies. The strips are partitioned into anisotropy classes; each class is organized as a directed graph whose vertices are all strips of the same anisotropy and whose edges link such strips to their "good continuations." The point-cloud data are reduced to counts that measure membership in strips. Each anisotropy graph is reduced to a subgraph consisting of strips with significant counts. The algorithm rejects $\mathbf{H}_0$ whenever some such subgraph contains a path that connects many consecutive significant counts. Comment: Published at http://dx.doi.org/10.1214/009053605000000787 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
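
    The counting-and-thresholding step can be illustrated with a toy Python sketch. The version below is a deliberately simplified stand-in: it uses axis-aligned rectangular strips of area $2/n$, a horizontal synthetic curve so that such strips can capture it, and a crude Bonferroni-style binomial cutoff; the published algorithm instead uses oriented, multiscale, anisotropic strips chained through a good-continuation graph, none of which is reproduced here.

        # Toy sketch: count point membership in strips of area 2/n and flag counts
        # that are implausibly large under the uniform null. The real algorithm uses
        # multiscale anisotropic strips of many orientations and a significant-runs
        # test on a good-continuation graph; this only shows the counting idea.
        import numpy as np
        from scipy.stats import binom

        rng = np.random.default_rng(1)
        n, m = 5000, 150                               # total points, points near a curve
        background = rng.uniform(size=(n - m, 2))
        t = rng.uniform(0.3, 0.6, size=m)              # horizontal segment, so axis-aligned
        curve = np.column_stack([t, np.full(m, 0.4)])  # strips can capture it
        points = np.vstack([background, curve + 0.0005 * rng.normal(size=curve.shape)])

        w = 0.1                                        # strip width
        h = 2.0 / (n * w)                              # strip height, so area = 2/n
        x_edges = np.linspace(0.0, 1.0, int(round(1 / w)) + 1)
        y_edges = np.linspace(0.0, 1.0, int(round(1 / h)) + 1)
        counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[x_edges, y_edges])

        # Under H0 each strip count is Binomial(n, 2/n), roughly Poisson(2).
        alpha = 1e-4 / counts.size                     # crude multiplicity correction
        threshold = binom.isf(alpha, n, 2.0 / n)
        significant = np.argwhere(counts > threshold)
        print(f"{len(significant)} significant strips out of {counts.size}")

    The multiscale, multi-orientation strip family in the paper is exactly what removes the need to know the curve's orientation or smoothness in advance, which this axis-aligned toy cannot do.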

    Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning

    We propose an adjusted Wasserstein distributionally robust estimator, based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. The transformation improves the statistical performance of WDRO: the adjusted WDRO estimator is asymptotically unbiased and has an asymptotically smaller mean squared error. Moreover, the adjustment does not compromise the out-of-sample performance guarantee of WDRO. We present sufficient conditions for the existence of the adjusted WDRO estimator and give a procedure for computing it. Specifically, we show how the adjusted WDRO estimator is developed in the generalized linear model. Numerical experiments demonstrate the favorable practical performance of the adjusted estimator over the classic one.
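
    For context, the classic (unadjusted) WDRO estimator can be sketched in Python for logistic regression, using the well-known reformulation of Wasserstein-robust logistic regression as norm-regularized empirical risk minimization when only the features may be perturbed. This is a hedged illustration: the radius, data, and smoothing constant are arbitrary, and the paper's adjustment, a nonlinear transformation of such an estimator, is not specified in the abstract and therefore not reproduced.

        # Sketch of a classic WDRO estimator for logistic regression via its
        # regularized reformulation (features-only Wasserstein ambiguity set).
        # The adjusted estimator proposed in the paper is NOT implemented here.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(2)
        n, d = 300, 5
        X = rng.normal(size=(n, d))
        beta_true = np.array([1.5, -2.0, 0.0, 0.0, 0.5])
        p = 1.0 / (1.0 + np.exp(-X @ beta_true))
        y = np.where(rng.uniform(size=n) < p, 1.0, -1.0)   # labels in {-1, +1}

        def wdro_logistic(X, y, radius):
            """Minimize mean logistic loss + radius * ||beta||_2 (robust surrogate)."""
            def objective(beta):
                loss = np.mean(np.logaddexp(0.0, -y * (X @ beta)))     # stable log(1+exp)
                return loss + radius * np.sqrt(beta @ beta + 1e-12)    # smoothed norm
            return minimize(objective, np.zeros(X.shape[1]), method="L-BFGS-B").x

        beta_wdro = wdro_logistic(X, y, radius=0.05)
        print(np.round(beta_wdro, 3))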

    Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks

    This paper studies the binary classification of unbounded data from $\mathbb{R}^d$ generated under Gaussian Mixture Models (GMMs) using deep ReLU neural networks. We obtain, for the first time, non-asymptotic upper bounds and convergence rates of the excess risk (excess misclassification error) for this classification problem without restrictions on the model parameters. The convergence rates we derive do not depend on the dimension $d$, demonstrating that deep ReLU networks can overcome the curse of dimensionality in classification. While the majority of existing generalization analyses of classification algorithms rely on a bounded domain, we consider an unbounded domain by leveraging the analyticity and fast decay of Gaussian distributions. To facilitate our analysis, we give a novel approximation error bound for general analytic functions using ReLU networks, which may be of independent interest. Gaussian distributions are well suited to modeling data arising in applications such as speech, images, and text; our results provide a theoretical verification of the observed efficiency of deep neural networks in practical classification problems.
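
    The setting can be reproduced in a few lines of Python. The sketch below is only illustrative: it draws two-dimensional data from two hand-picked equal-weight Gaussian mixtures, fits an off-the-shelf ReLU network (scikit-learn's MLPClassifier) rather than the architectures analyzed in the paper, and compares its test error to that of the Bayes rule, which is computable here because the mixtures are known.

        # Illustrative sketch: classify data from two known Gaussian mixtures with a
        # small ReLU network and estimate the excess misclassification error relative
        # to the Bayes classifier (available here because the mixtures are known).
        import numpy as np
        from scipy.stats import multivariate_normal
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(3)
        d, n_train, n_test = 2, 4000, 4000

        def sample_gmm(means, size):
            """Equal-weight mixture of unit-covariance Gaussians centered at `means`."""
            comps = rng.integers(len(means), size=size)
            return np.asarray(means)[comps] + rng.normal(size=(size, d))

        means0 = [(-2.0, 0.0), (2.0, 0.0)]          # class 0 mixture centers
        means1 = [(0.0, -2.0), (0.0, 2.0)]          # class 1 mixture centers
        X_train = np.vstack([sample_gmm(means0, n_train // 2), sample_gmm(means1, n_train // 2)])
        y_train = np.repeat([0, 1], n_train // 2)
        X_test = np.vstack([sample_gmm(means0, n_test // 2), sample_gmm(means1, n_test // 2)])
        y_test = np.repeat([0, 1], n_test // 2)

        def mixture_density(x, means):
            return np.mean([multivariate_normal.pdf(x, mean=m) for m in means], axis=0)

        # Bayes rule: pick the class whose mixture density is larger.
        bayes_pred = (mixture_density(X_test, means1) > mixture_density(X_test, means0)).astype(int)
        net = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                            max_iter=2000, random_state=0).fit(X_train, y_train)
        net_err = np.mean(net.predict(X_test) != y_test)
        bayes_err = np.mean(bayes_pred != y_test)
        print(f"network error {net_err:.3f}, Bayes error {bayes_err:.3f}, excess {net_err - bayes_err:.3f}")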

    Learning Ability of Interpolating Deep Convolutional Neural Networks

    It is frequently observed that overparameterized neural networks generalize well. Existing theoretical work on this phenomenon is mainly devoted to linear settings or fully connected neural networks. This paper studies the learning ability of an important family of deep neural networks, deep convolutional neural networks (DCNNs), in both the underparameterized and overparameterized regimes. We establish the first learning rates for underparameterized DCNNs that do not rely on the parameter or function-variable structure restrictions imposed in the literature. We also show that by adding well-defined layers to a non-interpolating DCNN, we can obtain interpolating DCNNs that maintain the good learning rates of the non-interpolating DCNN. This result is achieved by a novel network-deepening scheme designed for DCNNs. Our work provides theoretical verification of how overfitted DCNNs can generalize well.
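
    The statement about adding layers without losing performance can be illustrated with a generic toy construction in numpy: an extra ReLU block that implements the identity via ReLU(z) - ReLU(-z) = z, so the deepened network computes exactly the same function. This is a loose, hypothetical illustration of adding well-defined layers; it is not the paper's deepening scheme, which is designed around convolutional structure and interpolation.

        # Toy illustration: append a ReLU block that leaves the network's output
        # unchanged, using ReLU(z) - ReLU(-z) = z. Not the paper's deepening scheme;
        # it only shows that depth can be added without altering the function.
        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def base_network(x, W1, W2):
            """A small one-hidden-layer ReLU network."""
            return relu(x @ W1) @ W2

        def deepened_network(x, W1, W2):
            """The same network with one extra identity-implementing ReLU block."""
            h = relu(x @ W1) @ W2                         # original output, shape (n, k)
            k = h.shape[1]
            A = np.hstack([np.eye(k), -np.eye(k)])        # h -> [h, -h]
            B = np.vstack([np.eye(k), -np.eye(k)])        # ReLU(h) - ReLU(-h) = h
            return relu(h @ A) @ B

        rng = np.random.default_rng(4)
        x = rng.normal(size=(10, 3))
        W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
        print(np.allclose(base_network(x, W1, W2), deepened_network(x, W1, W2)))  # True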